ABSTRACT



When the same parameters are estimated by data from several 
independent samples, it may happen that, for any pair of samples, even though 
the test for parameter discrepancy is statistically significant, the two 
individual confidence intervals overlap. To overcome this potential 
contradiction, a new type of one-sample confidence interval is developed. 
Its evaluation leads to the same statistical decisions reached by the 
two-sample test for parameter discrepancy. Moreover, simultaneous 
decisions on parameter estimation, statistical inference, and directional 
prediction can be made with specified confidence coefficients and error rates 
by simply comparing a pair of comparable confidence intervals. In contrast 
with conventional confidence intervals, the new comparable confidence 
intervals are narrower, and they are disjoint or overlapping according to whether 
the parameter discrepancy is statistically significant or not. The proposed 
procedure can be applied to both simple and multiple a priori comparisons of 
means, proportions, and correlation coefficients. Due to its mathematical 
simplicity, the method should be valuable for research practitioners and 
quite suitable to be taught in courses of research methods in the behavioral 
and social sciences. An appendix explains the derivation of the formulas in 
Table 1. (Contains 3 tables and 49 references.) (Author/SLD) 
Comparable Confidence Intervals for Multi-sample and Replication Studies 

There are several circumstances in which the comparison of individual confidence 
intervals (for $\phi_i$, i = 1, 2, ..., k) across a series of independent samples is needed. The 
comparison may be conducted jointly with the evaluation of the associated significance tests 
($H_{0,i}: \phi_i = \phi_0$, i = 1, 2, ..., k) or confidence intervals of the parameter discrepancy 
($\delta$), where $\delta = \phi_i - \phi_{i'}$, $i \ne i'$ = 1, 2, ..., k. First, the comparison of 
confidence intervals of the true 
parameter, associated with the same hypotheses but obtained under various research 
conditions (e.g., with different sample sizes and sample variances in multi-sample studies), 
will help identify not only the statistically significant results but possibly also the 
practically, or clinically, important hypothetical conjectures. Usually such comparisons can be 
performed by means of simple bar or line graphs. For example, the graph may contain confidence 
intervals drawn horizontally one on top of another and a vertical line representing the 
hypothetical value of the parameter. The confidence interval that lies furthest from the 
vertical line in the predicted direction may be chosen to indicate the range of both statistically 
and clinically significant effect (Borenstein, 1994). Secondly, sometimes confidence intervals 
may be more informative than statistical tests in the evaluation and comparison of the 
statistical results. For example, a student’s performance in a national standards test of 
Mathematics is deemed unsatisfactory. One would reach such a conclusion more convincingly 
if it can be shown that the two confidence intervals for the individual and national true means 
are separable, namely, even the upper bound of the former falls below the lower bound of the 
latter. Clearly, for this type of single-subject analysis, visual methods such as the comparison 
of confidence intervals are as necessary as, if not more meaningful than, the p-value or an 
index of the statistical power of the test. Thirdly, if the research objective is to estimate the 
typical range of the parameter of interest on the basis of multi-sample studies, then confidence 
intervals of the parameter discrepancy may not be relevant. Comparing individual 
intervals for the parameter itself will identify the extent of sampling fluctuations and, as a 
result, yield a more precise estimate of the true parameter range. This is especially fitting if 
estimates of the parameter discrepancy are found statistically significant. Hsu (1994) gives the 
following example: "two data sets may give rise to the two confidence intervals $\mu_j \in 1 \pm 0.2$ 
months and $\mu_j \in 10 \pm 2$ months, which convey very different information about $\mu_j$, yet the 
same p-value associated with $H_{j0}$" (p. 4), where $H_{j0}: \mu_j = 0$. Note that the test of means 
difference and the two individual mean tests may all be statistically significant (possibly at 
approximately the same p-values), but the two individual confidence intervals have very 
different widths. Depending on the units of measurement and what the parameter ($\mu$) 
represents, the researcher would prefer one confidence interval over the other for the 
estimate of the parameter range. Such considerations are often overlooked if one computes 
only the test of means difference. Moreover, the computation of the significance test is not 
necessary if one wants to estimate the p-value of the test statistic and the power of the test. It 
will be shown that, given the information of a confidence interval, it is possible to recover the 
p-value of the associated test statistic, and that the power of the test is the same as the power of 
the confidence interval. Last but not least, the confidence interval can be used to identify the 
directions of the parameter and the parameter discrepancy. For the test of $H_{i0}: \phi_i = 0$, if the 
confidence interval for $\phi_i$ is entirely to the left of 0 then $\phi_i < 0$, whereas if it is completely to 
the right of 0 then $\phi_i > 0$. Following the procedures of two-tailed directional tests (Kaiser, 
1960; Shaffer, 1972; and Leventhal and Huynh, 1996a, 1996b), one can compute the risk of 
making directional decisions (Type III error rate). 

But how can confidence intervals computed for different samples be compared? It has 
been recognized that, within each sample, the same statistical decision can be reached 
either by conducting a statistical test (e.g., an evaluation of the p-value of the test statistic) or by 
examining whether the hypothetical value of the parameter falls within the corresponding confidence 
limits. The match between statistical tests and confidence intervals for statistical inference is 
desirable and plays a major role in advocating the use of confidence intervals (Natrella, 
1960). However, the comparison of the individual confidence intervals obtained for the single 
parameters, say means, may or may not reproduce the same statistical decisions on the 
statistical significance of the means difference across different samples. This is because, for any 
pair of means, the tests for individual means and for the means discrepancy are based on different 
estimates of the standard errors, and on t statistics with different degrees of freedom if the 
population variances are unknown. 

There is a growing interest in confidence intervals among applied researchers. It is 
expected that confidence intervals will become the standard method for statistical inference in 
the social and behavioral sciences. Although statistical decisions based on post-hoc multiple 
comparisons of confidence intervals for the means have been discussed, there is still a need 
for a systematic study of the procedures and conditions for comparing confidence intervals 
associated with pre-planned tests of means, proportions and correlations; methods for 
evaluating such confidence intervals are lacking. It is the purpose of this paper to fulfill this need. 
Moreover, it will be argued that the statistical decisions based on comparable confidence 
intervals will satisfy the golden rule of equivalent outcomes between hypothesis testing and 
confidence interval evaluation, a hurdle that prevents the comparison of the conventional 
confidence intervals unless very stringent conditions are met (e.g., same sample sizes and 
population variances). The methods to compute and evaluate comparable confidence intervals 
will be discussed in the first three sections. Readers who are less concerned with the methodological 
development and want a quick overview of the utility of the proposed procedures may 
benefit from reading the examples at the end of these sections before moving to the bulk of 
the article. Some related issues in the application of the proposed procedure and general 
conclusions will be drawn in the final section. 

Background of the Study 

The Problem 

In the following discussion, all tests are two-tailed and evaluated at $\alpha$, the nominal 
significance level, so that $\alpha/2$ is the size of each tail of the critical region. Consider k independent 
populations characterized by the parameters $\phi_i$, i = 1, ..., k, for $k \ge 2$, where $\phi$ may denote 
the mean ($\mu$), proportion ($\pi$) or correlation coefficient ($\rho$). The development of the proposed 
method is based on the following "equivalency principle": 

If $H_0: \phi_i = \phi_{i'}$ is rejected at $\alpha$ in favor of $H_A: \phi_i \ne \phi_{i'}$ for any $i \ne i'$ = 1, ..., k, 
then the individual $100(1 - \alpha)\%$ two-tailed confidence intervals for $H_0: \phi_i = \phi_0$ 
and $H_0: \phi_{i'} = \phi_0$, where $\phi_0$ is a hypothetical value of $\phi$, should be separable, or 
nonoverlapping. On the other hand, if $H_0: \phi_i = \phi_{i'}$ is conceded then the 
individual confidence intervals for $\phi_i$, $i \ne i'$ = 1, ..., k, constructed at the same 
level of $\alpha$, should overlap. Two confidence intervals are said to overlap if the upper 
(lower) bound of the confidence interval for the smaller (larger) parameter 
estimate is located inside the confidence interval for the larger (smaller) 
parameter estimate. Otherwise, the confidence intervals are considered 
separable. 
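
The overlap criterion in the principle above can be written as a small predicate. The following is only a sketch: the function names are hypothetical, intervals are assumed to be closed (lower, upper) pairs, and the general "share at least one point" test is used, which also treats nested intervals as overlapping.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals (lower, upper) share at least one point."""
    (low_a, high_a), (low_b, high_b) = ci_a, ci_b
    return max(low_a, low_b) <= min(high_a, high_b)


def intervals_separable(ci_a, ci_b):
    """Two intervals are separable exactly when they do not overlap."""
    return not intervals_overlap(ci_a, ci_b)


# Two hypothetical pairs of intervals, used for illustration only.
print(intervals_overlap((22.0, 29.0), (16.0, 24.0)))    # intervals share 22..24
print(intervals_separable((23.0, 28.0), (17.0, 22.5)))  # a gap between 22.5 and 23
```

Under the equivalency principle, a significant between-sample test should correspond to `intervals_separable` returning True for the two one-sample intervals.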

For simplicity, the following discussion is based on the tests of means in two-sample 
studies under the assumption of variance homogeneity ($\sigma_i^2 = \sigma^2$, i = 1, 2). Let $M_i$ be the 
sample mean of the ith independent population with the corresponding sample size $n_i$, i = 1, 
2, not necessarily equal. The $100(1 - \alpha)\%$ two-tailed individual confidence intervals for 
testing $H_{0,i}: \mu_i = \mu_0$ (i = 1, 2) are computed by the conventional method as 

(1) $CI_i:\ M_i \pm Z_{1-\alpha/2}\,SE_i,\quad i = 1, 2,$ 

(two one-sample confidence intervals), where $SE_i = \sigma/\sqrt{n_i}$, i = 1, 2, and $Z_{1-\alpha/2}$ is the 
$(1 - \alpha/2)$th quantile of the standard normal distribution (such that $Z_{\alpha/2} = -Z_{1-\alpha/2}$). Without loss of 
generality, assume $M_1 > M_2$. The corresponding confidence interval associated with the test 
of $H_{0,d}: \mu_1 - \mu_2 = 0$ is 

(2) $CI_d:\ (M_1 - M_2) \pm Z_{1-\alpha/2}\,SE_d,$ 

(a two-sample confidence interval), where $SE_d = \sigma\sqrt{(1/n_1) + (1/n_2)}$. The subscript d 
represents the fact that the means difference is being assessed. If the confidence interval $CI_d$ 
does not contain the value of zero then the null hypothesis $H_{0,d}$ is rejected at the 
predetermined significance level $\alpha$ for the two-tailed test. Otherwise, if $CI_d$ contains zero, 
one fails to reject the null hypothesis $H_{0,d}$. 

It is possible that the two within-sample confidence intervals overlap even though the 
between-sample test for the parameter discrepancy is statistically significant at $\alpha$. The 
question of interest is how the individual confidence intervals in Equation 1 can be compared so 
that the same statistical decision obtained for $H_{0,d}$ can be reached according to the 
"equivalency principle" stated above. 

A Solution 

Let $CI_{i,U}$ and $CI_{i,L}$, i = 1, 2, denote the upper and lower bounds of the individual 
confidence intervals, respectively, and let $\epsilon_i$, i = 1, 2, be any small, positive constant. Since $M_1 > 
M_2$, $CI_1$ lies to the right of $CI_2$. If $H_{0,d}$ is rejected at $\alpha/2$ then the lower bound of $CI_1$ is at 
least equal to the smaller mean ($M_2$) so that $M_1 - Z_{1-\alpha/2}\,\sigma/\sqrt{n_1} = M_2 + \epsilon_1$, or 

(3) $(M_1 - M_2) - Z_{1-\alpha/2}\,\sigma/\sqrt{n_1} = \epsilon_1,$ 

and the upper bound of $CI_2$ is at most equal to the larger mean ($M_1$) so that 
$M_2 + Z_{1-\alpha/2}\,\sigma/\sqrt{n_2} = M_1 - \epsilon_2$, or 

(4) $(M_1 - M_2) - Z_{1-\alpha/2}\,\sigma/\sqrt{n_2} = \epsilon_2.$ 

Conditions (3) and (4) imply that when the test of means difference is statistically significant, 
the lower bound of the $100(1 - \alpha)\%$ confidence interval computed from the individual 
samples should be 

$CI_{c,L}:\ (M_1 - M_2) - Z_{1-\alpha/2}\,\sigma\{(1/\sqrt{n_1}) + (1/\sqrt{n_2})\}.$ 

The subscript c stands for the correction on the conventional one-sample confidence intervals. 
By symmetry, it is easy to write the equation for its upper bound. In other words, if the 
difference of the two individual confidence intervals in Equation 1 is computed, the resulting 
$100(1 - \alpha)\%$ two-tailed confidence interval for the means difference would be specified as 

(5) $CI_c:\ (M_1 - M_2) \pm Z_{1-\alpha/2}\,SE_c,$ 

where $SE_c = SE_1 + SE_2 = \sigma\{(1/\sqrt{n_1}) + (1/\sqrt{n_2})\}$. The width of the confidence interval in 
Equation 5 is larger than that in Equation 2 since $(1/\sqrt{n_1}) + (1/\sqrt{n_2}) > \sqrt{(1/n_1) + (1/n_2)}$ for 
any positive $n_i$, i = 1, 2. This explains why the "equivalency principle" can be 
violated. To prevent this possibility, a modification of $SE_i$, i = 1, 2, that renders the equality 
$SE_c = SE_d$ must be found. As a solution, the critical value in Equation 1 is set to be $Z = c_f Z_{1-\alpha/2}$, 
where $c_f$ is the correction factor of the form 

(6) $c_f = \dfrac{SE_d}{SE_c} = \dfrac{\sqrt{n_1 + n_2}}{\sqrt{n_1} + \sqrt{n_2}},$ 

implying $c_f < 1$. Upon replacing $Z_{1-\alpha/2}$ by $c_f Z_{1-\alpha/2}$ in Equation 1, the conventional confidence 
intervals are revised to yield the following comparable confidence intervals 

(7) $CI^*_i:\ M_i \pm Z_{1-\alpha/2}\,SE^*_i,$ 

where $SE^*_i = c_f\,SE_i$ for i = 1, 2. Clearly, the width of $CI^*_i$ is narrower than that of $CI_i$ for i = 
1, 2, respectively. To recapitulate, if the means difference is statistically significant at $\alpha/2$ 
then the comparable confidence intervals in Equation 7 are separable. This can be 
accomplished even if the two $100(1 - \alpha)\%$ two-tailed confidence intervals in Equation 1 
overlap. 
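
The role of the correction factor in Equation 6 can be checked numerically. The sketch below uses hypothetical values for the standard deviation and the sample sizes, and the function names are illustrative rather than part of the original method. It confirms that the sum of the one-sample standard errors exceeds the two-sample standard error, and that multiplying the sum by $c_f$ restores equality.

```python
import math


def correction_factor(n1, n2):
    """Equation 6: c_f = sqrt(n1 + n2) / (sqrt(n1) + sqrt(n2))."""
    return math.sqrt(n1 + n2) / (math.sqrt(n1) + math.sqrt(n2))


def standard_errors(sigma, n1, n2):
    """Return (SE_1, SE_2, SE_d) for a known common standard deviation sigma."""
    se_1 = sigma / math.sqrt(n1)
    se_2 = sigma / math.sqrt(n2)
    se_d = sigma * math.sqrt(1.0 / n1 + 1.0 / n2)
    return se_1, se_2, se_d


# Hypothetical inputs chosen only for illustration.
sigma, n1, n2 = 4.0, 50, 20
se_1, se_2, se_d = standard_errors(sigma, n1, n2)
c_f = correction_factor(n1, n2)

print(se_1 + se_2 > se_d)                         # SE_c is wider than SE_d
print(math.isclose(c_f * (se_1 + se_2), se_d))    # c_f * SE_c equals SE_d
```

Because $c_f\,SE_c = SE_d$, the half-widths of the two comparable intervals in Equation 7 add up to exactly the half-width of the two-sample interval in Equation 2, which is what makes the one-sample and two-sample decisions agree.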

An Alternative Solution 

For the purpose of preserving the "equivalency principle", instead of adjusting the 
standard error of the estimates in computing $SE^*_i$, i = 1, 2, as above, one can modify the 
nominal significance level and revise the confidence coefficient of the conventional intervals 
accordingly. The formula for computing a Type I probability corresponding to a $100(1 - \alpha)\%$ 
conventional confidence interval is given by 

(8) $\alpha = 2[1 - \Phi(Z_{1-\alpha/2})],$ 

where $\Phi(x) = \Pr(X \le x)$ is the probability of obtaining a standard normal value of at most x, that is, the area 
under the standard normal curve to the left of the point x. Therefore, the adjusted or 
comparative Type I error ($\alpha'$), corresponding to the nominal level ($\alpha$), can be derived as 

(9) $\alpha' = 2[1 - \Phi(c_f Z_{1-\alpha/2})].$ 

This enables the computation of comparative confidence intervals, defined as the $100(1 - 
\alpha')\%$ two-tailed conventional confidence interval of the form 

(10) $CI'_i:\ M_i \pm Z_{1-\alpha'/2}\,SE_i.$ 

Since $\Phi(Z_{1-\alpha/2}) > \Phi(c_f Z_{1-\alpha/2})$, $\alpha$ is smaller than $\alpha'$, implying that $Z_{1-\alpha'/2} < Z_{1-\alpha/2}$; the length 
of the $100(1 - \alpha')\%$ comparative confidence interval ($CI'_i$) is narrower than that of the 
$100(1 - \alpha)\%$ conventional confidence interval ($CI_i$) for i = 1, 2, respectively. 

Example 1 

Consider two independent samples drawn from a population having a known standard 
deviation of $\sigma = 10$, the first sample with $n_1 = 36$, $M_1 = 25.5$ and the second sample with $n_2 
= 25$ and $M_2 = 20$. The 95% two-tailed individual confidence intervals for testing $H_{0,i}: \mu_i = \mu_0$ 
(i = 1, 2), for any value of $\mu_0$, are given by the standard method as $CI_1 = 25.5 \pm (1.96)(10/6) 
= (22.23, 28.77)$ and $CI_2 = 20 \pm (1.96)(10/5) = (16.08, 23.92)$. These two conventional 
confidence intervals overlap from 22.23 to 23.92, or about 26% of the confidence interval for 
$\mu_1$ and 22% of the confidence interval for $\mu_2$. However, the test of means difference ($H_{0,d}: \mu_1 = 
\mu_2$) is statistically significant at $\alpha = .05$ (two-tailed) since its 95% two-tailed confidence 
interval is equal to $CI_d = (25.5 - 20) \pm (1.96)(10)\sqrt{(1/36) + (1/25)} = (0.40, 10.60)$. The 
required correction factor is computed as $c_f = \sqrt{36 + 25}/(\sqrt{36} + \sqrt{25}) = 0.71$. Hence, the 
corresponding comparable confidence intervals are $CI^*_1 = 25.5 \pm (1.96)(0.71)(10/6) = (23.18, 
27.82)$ and $CI^*_2 = 20 \pm (1.96)(0.71)(10/5) = (17.22, 22.78)$. As expected, the 95% two-tailed 
comparable confidence intervals are separable. The comparative Type I error is $\alpha' = 2[1 - 
\Phi((0.71)(1.96))] = 2[1 - 0.91798] = 0.164$ and the comparative confidence intervals are $CI'_1$ = 
(24.34, 26.66) and $CI'_2$ = (18.61, 21.40). Therefore, the following three statistical decisions 
can be made for the given data: (i) the conventional 83.6% two-tailed confidence intervals for 
testing $H_{0,i}: \mu_i = \mu_0$ (i = 1, 2) are disjoint, (ii) the 95% two-tailed comparable confidence 
intervals are separable, and (iii) the test of means difference ($H_{0,d}: \mu_1 = \mu_2$) is statistically 
significant at $\alpha = .05$ (two-tailed). 
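
The conventional, two-sample and comparable intervals of Example 1 can be reproduced with a few lines of code. This is a minimal sketch using only the Python standard library; the function name and the dictionary keys are illustrative assumptions, and the exact normal quantile is used in place of the rounded value 1.96.

```python
import math
from statistics import NormalDist


def example_1(sigma=10.0, n1=36, m1=25.5, n2=25, m2=20.0, alpha=0.05):
    """Recompute the intervals of Example 1 (known sigma, Z statistic)."""
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2)
    se_1, se_2 = sigma / math.sqrt(n1), sigma / math.sqrt(n2)
    se_d = sigma * math.sqrt(1 / n1 + 1 / n2)
    c_f = se_d / (se_1 + se_2)                       # Equation 6
    return {
        "conventional": [(m1 - z * se_1, m1 + z * se_1),
                         (m2 - z * se_2, m2 + z * se_2)],       # Equation 1
        "difference": (m1 - m2 - z * se_d, m1 - m2 + z * se_d),  # Equation 2
        "comparable": [(m1 - z * c_f * se_1, m1 + z * c_f * se_1),
                       (m2 - z * c_f * se_2, m2 + z * c_f * se_2)],  # Equation 7
        "c_f": c_f,
        "alpha_prime": 2 * (1 - normal.cdf(c_f * z)),            # Equation 9
    }


result = example_1()
for name, value in result.items():
    print(name, value)
```

The lower bound of the first comparable interval exceeds the upper bound of the second, which matches the significant two-sample test, whereas the two conventional intervals overlap.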

Comments 

Although the evaluation of $CI^*_i$ and $CI'_i$ would yield outcomes that satisfy the 
"equivalency principle" in statistical decisions, the two types of statistical intervals are not 
identical. It is recommended that the comparable confidence intervals be used in preference to 
the comparative confidence intervals for a priori pairwise comparisons. First of all, because 
the $CI'_i$ are computed on the basis of both the correction factor $c_f$ and the comparative significance 
level $\alpha'$, as shown in Equation 9, they may be overadjusted for the statistical significance of 
the between-sample test. Moreover, since $\alpha$ must be set in advance, it is more natural to 
compute the comparable confidence intervals with the $(1 - \alpha)$ coefficient than the comparative 
confidence intervals with a corrected coefficient $(1 - \alpha')$. 

The significance of the test statistic is measured by its p-value. Since the p-value is 
the Type I error probability computed on the basis of sample data, its formula is the same as 
Equation 8 upon replacing $Z_{1-\alpha/2}$ by the test statistic itself (Welsh, 1996). Let $\alpha_p$ denote the 
p-value for the test of means difference. It can be computed as 

$\alpha_p = 2[1 - \Phi\{(M_1 - M_2)/SE_d\}],$ 

where $SE_d = \sigma\sqrt{(1/n_1) + (1/n_2)}$. The $100(1 - \alpha_p)\%$ conventional confidence interval contains 
the value of 0 as its lower bound for $M_1 > M_2$ (or its upper bound if $M_1 < M_2$). It may be 
called the significant confidence interval for estimating the parameter discrepancy and is 
computed as 

$(M_1 - M_2) \pm Z_{1-\alpha_p/2}\,SE_d = (0,\ 2(M_1 - M_2)),$ 

since $Z_{1-\alpha_p/2}\,SE_d = |M_1 - M_2|$ for $M_1 > M_2$. Although the p-value of the test statistic is generally 
meaningful, for the simultaneous evaluation of $H_{0,i}$ and $H_{0,d}$ the significant confidence interval 
may not be relevant and will not be discussed further in this paper. 
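
As an illustration of the p-value formula and the significant confidence interval, the sketch below applies them to the data of Example 1. The function name is hypothetical, and the code assumes a known common standard deviation and $M_1 > M_2$, as in the text.

```python
import math
from statistics import NormalDist


def p_value_and_significant_interval(m1, m2, sigma, n1, n2):
    """Return (alpha_p, interval) for the two-sample Z test of the means difference."""
    normal = NormalDist()
    se_d = sigma * math.sqrt(1 / n1 + 1 / n2)
    z_observed = abs(m1 - m2) / se_d
    alpha_p = 2 * (1 - normal.cdf(z_observed))
    # Critical value for a 100(1 - alpha_p)% interval; equals z_observed by construction.
    z_critical = normal.inv_cdf(1 - alpha_p / 2)
    half_width = z_critical * se_d
    diff = m1 - m2
    return alpha_p, (diff - half_width, diff + half_width)


alpha_p, interval = p_value_and_significant_interval(25.5, 20.0, 10.0, 36, 25)
print(alpha_p)    # two-tailed p-value of the means difference
print(interval)   # lower bound at 0, upper bound at 2 * (M1 - M2)
```

The interval touches zero at its lower bound because its confidence coefficient was chosen to be exactly $1 - \alpha_p$.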

Using Comparable Confidence Intervals for 
Parameter Estimation and Hypothesis Testing in Two-sample Studies 

In the following, a procedure for two-sample tests of means, proportions and 
correlation coefficients will be discussed in the context of a simultaneous assessment of 
within-sample and between-sample tests of the same population parameter. The discussion 
will be carried out for three popular types of two-tailed tests: (i) nondirectional tests evaluated 
at $\alpha/2$, (ii) nondirectional tests with unequal allocation of Type I error rates, and (iii) 
nondirectional within-sample tests and a directional between-sample test, all evaluated at $\alpha/2$. 
Nondirectional Two-Tailed Tests with Symmetric Critical Regions 

Procedure 

Suppose the researcher wants to estimate the range of each $\phi_i$, i = 1, 2, and 
simultaneously, to test the difference between $\phi_1$ and $\phi_2$. For these purposes, in the following 
procedure, it suffices to evaluate only the one-sample comparable confidence intervals. The 
testing procedure consists of the following steps: 

(a) Specifying the hypotheses: 

(Within-sample tests): $H_{0,i}: \phi_i = \phi_0$ vs. $H_{A,i}: \phi_i \ne \phi_0$, i = 1, 2, where $\phi_0$ can be any 
value of interest. 

(Between-sample test): $H_{0,d}: \phi_1 - \phi_2 = 0$ vs. $H_{A,d}: \phi_1 - \phi_2 \ne 0$. 

(b) Computing the $100(1 - \alpha)\%$ two-tailed comparable confidence intervals ($CI^*_i$): 

$CI^*_i:\ \hat{\phi}_i \pm CV_{1-\alpha/2}\,SE^*_i,\quad i = 1, 2,$ 

where $SE^*_i = c_f\,SE_i$, CV is the critical value of the test statistic and $\hat{\phi}_i$ represents the sample 
estimate of $\phi_i$. 

(c) Making two statistical decisions at the same significance level of $\alpha/2$: 

(i) Reject $H_{0,i}$ if the comparable confidence interval $CI^*_i$ does not contain $\phi_0$ and 
decide that $\phi_i$ is probably not equal to $\phi_0$. Otherwise, concede $H_{0,i}$, i = 1, 2, and assume that $\phi_i$ 
is equal to $\phi_0$. 

(ii) Reject $H_{0,d}$ if the comparable confidence intervals $CI^*_1$ and $CI^*_2$ are disjoint and 
decide that the difference in the two estimates of $\phi_1$ and $\phi_2$ is statistically significant. 
Otherwise, concede $H_{0,d}$ and assume that the difference in the two parameter 
estimates is not statistically significant. 

For the two-sample tests of means, proportions and correlation coefficients, the general 
form of the correction is found to be 

(11) $c_f = \dfrac{\sqrt{a + b}}{\sqrt{a} + \sqrt{b}},$ 

where a and b are functions of variances and/or the number of cases. The results are 
summarized in Table 1 and their derivations are given in the Appendix. 

Insert Table 1 about here 

Equation 11 can be simplified to $c_f = \sqrt{1 + r}/(1 + \sqrt{r})$ where r = a/b; the values 
of a and b can be defined such that $a \ge b$ and $r \ge 1$. Since r represents the ratio of sample 
variances and/or cases, its values would likely be from 1 to 10 in most practical research 
situations. In these circumstances, the range of $c_f$ would be from .71 (when r = 1) to .80 (when 
r = 10). Table 2 provides the comparative confidence coefficients $(1 - \alpha')$ for the 
conventional confidence intervals corresponding to the comparable confidence intervals 
computed with the nominal $\alpha$ levels for Z and t tests under the assumption of variance 
homogeneity. For example, if $H_{0,d}$ is statistically significant then, given a correction factor of 
.70 from a Z distribution, both the 95% comparable confidence interval and the 83% 
conventional confidence interval have separable limits. Given a correction factor of .76, if $H_{0,d}$ 
is statistically significant then an 80% conventional (or comparative) confidence interval and 
the 90% comparable confidence intervals for the t test with 10 degrees of freedom have 
disjoint confidence bounds. 

Insert Table 2 about here 
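
The range of the correction factor and the first Z-based illustration in the paragraph above can be checked directly. The following sketch covers the Z distribution only; the function names are illustrative assumptions, and the comparative coefficient is computed as $1 - \alpha'$ from Equation 9.

```python
import math
from statistics import NormalDist


def cf_from_ratio(r):
    """Simplified Equation 11: c_f = sqrt(1 + r) / (1 + sqrt(r)), with r = a / b >= 1."""
    return math.sqrt(1 + r) / (1 + math.sqrt(r))


def comparative_coefficient(c_f, alpha=0.05):
    """Comparative confidence coefficient 1 - alpha' for a Z test (Equation 9)."""
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2)
    alpha_prime = 2 * (1 - normal.cdf(c_f * z))
    return 1 - alpha_prime


for r in (1, 2, 5, 10):
    print(r, round(cf_from_ratio(r), 3))

print(round(comparative_coefficient(0.70), 3))
```

Over r from 1 to 10 the factor changes little, which is why a single comparative coefficient near 83% is a reasonable summary for 95% comparable intervals in this range.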

Example 2 

Two samples are drawn randomly from a population having unknown variance (with 
$n_1 = 31$, $M_1 = 28$ and $S_1^2 = 144$; and $n_2 = 16$, $M_2 = 20$ and $S_2^2 = 62$, respectively). Suppose 
the test of variance homogeneity (O'Brien, 1981) is statistically significant at $\alpha = .05$ (this is 
the Case Id in the Appendix). The 95% two-tailed individual confidence intervals for testing 
$H_{0,i}: \mu_i = \mu_0$ (i = 1, 2), for any value of $\mu_0$, are given by the standard method as $CI_1$ = (23.60, 
32.40) and $CI_2$ = (15.80, 24.20). The overlap from 23.60 to 24.20 represents nearly 7% of the 
interval length of either $CI_1$ or $CI_2$. However, the test of means difference ($H_{0,d}: \mu_1 = \mu_2$) is 
statistically significant at $\alpha = .05$ (two-tailed) according to the 95% two-tailed confidence 
interval for the mean discrepancy ($\delta$) = (2.11, 13.89). In computing $CI_d$ for $f^* = 42.20$ 
(the degrees of freedom according to the Satterthwaite-Welch approximation for t tests), the critical 
value of $t_{f^*, 1-\alpha/2}$ is found to be 2.02 (say, by using the command TINV(.025, 42.20) in the 
SAS computer program, SAS Institute Inc., 1990). The required correction factor is $c_f = 0.71$ 
and the adjusted within-sample standard errors are $SE^*_1 = 1.53$ and $SE^*_2 = 1.39$. Hence, the 
corresponding comparable confidence intervals are $CI^*_1$ = (24.88, 31.12) and $CI^*_2$ = (17.03, 
22.97). As expected, the 95% two-tailed comparable confidence intervals are separable. The 
comparative Type I error rate is $\alpha' = 2[1 - T_{f^*}((0.71)(2.018))] = 2[1 - 0.920] = 0.16$ and the 
associated comparative confidence intervals are also disjoint, being $CI'_1$ = (25.83, 30.17) 
and $CI'_2$ = (17.98, 22.02). 
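
The standard errors, the Satterthwaite-Welch degrees of freedom and the correction factor of Example 2 can be recomputed from the summary statistics. The sketch below is limited to quantities that need no t quantile, so it relies only on the Python standard library; the function name and dictionary keys are illustrative, and the correction factor is taken as $SE_d/(SE_1 + SE_2)$ by analogy with Equation 6, which is an assumption about the unequal-variance case described in the Appendix.

```python
import math


def welch_summary(n1, var1, n2, var2):
    """Standard errors, Welch degrees of freedom and correction factor
    for two samples with unequal variances."""
    v1, v2 = var1 / n1, var2 / n2               # squared standard errors
    se_1, se_2 = math.sqrt(v1), math.sqrt(v2)
    se_d = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    c_f = se_d / (se_1 + se_2)                  # assumed analogue of Equation 6
    return {"se_1": se_1, "se_2": se_2, "se_d": se_d, "df": df,
            "c_f": c_f, "se_star_1": c_f * se_1, "se_star_2": c_f * se_2}


summary = welch_summary(31, 144.0, 16, 62.0)
for name, value in summary.items():
    print(name, round(value, 2))
```

Multiplying each adjusted standard error by the corresponding within-sample t critical value then gives the half-widths of the comparable intervals.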
Nondirectional Two-Tailed Tests with Asymmetric Critical Regions 

Procedure 

Suppose the researcher wants to test the difference between $\phi_1$ and $\phi_2$ and to estimate 
the range of $\phi_i$ at different confidence coefficients, say $(1 - \alpha_1)$ and $(1 - \alpha_2)$. The relevant 
hypotheses are specified below. 

(Within-sample tests): 

$H_{0,1}: \phi_1 = \phi_0$ vs. $H_{A,1}: \phi_1 \ne \phi_0$, evaluated at $\alpha_1/2$, 

$H_{0,2}: \phi_2 = \phi_0$ vs. $H_{A,2}: \phi_2 \ne \phi_0$, evaluated at $\alpha_2/2$, 

(Between-sample test): $H_{0,d}: \phi_1 - \phi_2 = 0$ vs. $H_{A,d}: \phi_1 - \phi_2 \ne 0$, evaluated at $\Gamma/2$. 

In this case, the significance levels $\alpha_1$ and $\alpha_2$ are predetermined but $\Gamma$ is a function 
of $\alpha_1$ and $\alpha_2$. We now show how a value of $\Gamma$ can be determined. Following the same 
argument leading to Equation 5, the resulting confidence interval for the parameter difference 
is of the form 

(12) $CI_c:\ (\hat{\phi}_1 - \hat{\phi}_2) \pm Z_{1-\Gamma/2}\,SE_d = (\hat{\phi}_1 - \hat{\phi}_2) \pm Q\,SE_d,$ 

where $Q = [Z_{1-\alpha_1/2}\sqrt{n_2} + Z_{1-\alpha_2/2}\sqrt{n_1}]/\sqrt{N}$, $N = n_1 + n_2$. Hence, the required Type I error rate 
for the between-sample test becomes 

(13) $\Gamma = 2[1 - \Phi(Q)],$ 

where the function $\Phi(x)$ has been defined previously. From Equation 13, Q is the inverse 
function of the standard normal distribution evaluated at $(1 - \Gamma/2)$, i.e., $Q = \Phi^{-1}(1 - \Gamma/2) = Z_{1-\Gamma/2}$. 
The corresponding comparable confidence intervals are given by 

(14) $CI^*_i:\ \hat{\phi}_i \pm Z_{1-\Gamma/2}\,SE^*_i,$ 

where $SE^*_i = c_f\,SE_i$ for i = 1, 2, are the same as derived for symmetric confidence intervals. 

Alternatively, for these sample sizes, by declaring significance whenever a $100(1 - 
\alpha_1)\%$ conventional confidence interval for $\mu_1$ does not overlap a $100(1 - \alpha_2)\%$ confidence 
interval for $\mu_2$, one decides that the two-sample test of $\mu_1 - \mu_2$ is statistically significant at $\Gamma/2$. 
Otherwise, by declaring that these two one-sample confidence intervals overlap, one concedes the 
null hypothesis of $\mu_1 - \mu_2 = 0$ at $\Gamma/2$. The comparative Type I error rate is 

$\alpha' = 2[1 - \Phi(c_f Q)],$ 

and the related comparative confidence intervals are obtained as in Equation 10 with this 
value of $\alpha'$. 



Example 3 

For the data in Example 1, the correction factor is computed to be $c_f = .71$. Suppose 
$\alpha_1 = .05$ and $\alpha_2 = .10$; then $Q = [(1.96)\sqrt{25} + (1.645)\sqrt{36}]/\sqrt{36 + 25} = 2.52$ and $\Gamma = 2[1 - 
\Phi(2.52)] = 2(1 - .994) = .012$. The 98.8% two-tailed confidence interval for the difference 
scores is $(25.5 - 20) \pm (2.52)(10)\sqrt{(1/36) + (1/25)} = (-1.06, 12.06)$, implying that the 
means difference is not statistically significant. As expected, the 95% two-tailed comparable 
confidence intervals for the two samples, namely $CI^*_1$: $25.5 \pm (2.52)(0.71)(10/6) = (22.52, 
28.48)$ and $CI^*_2$: $20 \pm (2.52)(0.71)(10/5) = (16.42, 23.58)$, respectively, are not separable. 

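
The quantities Q and $\Gamma$ in Equations 12 and 13 can be computed as follows. This is a sketch with a hypothetical function name; it uses exact normal quantiles from the Python standard library rather than the rounded values 1.96 and 1.645, so the results agree with Example 3 only up to rounding.

```python
import math
from statistics import NormalDist


def between_sample_level(alpha_1, alpha_2, n1, n2):
    """Return (Q, Gamma): the pooled critical value of Equation 12 and the
    implied two-tailed Type I error rate of Equation 13."""
    normal = NormalDist()
    z_1 = normal.inv_cdf(1 - alpha_1 / 2)
    z_2 = normal.inv_cdf(1 - alpha_2 / 2)
    q = (z_1 * math.sqrt(n2) + z_2 * math.sqrt(n1)) / math.sqrt(n1 + n2)
    gamma = 2 * (1 - normal.cdf(q))
    return q, gamma


q, gamma = between_sample_level(0.05, 0.10, 36, 25)
print(round(q, 2), round(gamma, 3))
```

Setting both levels to the same $\alpha$ gives $Q = Z_{1-\alpha/2}/c_f$, which is larger than $Z_{1-\alpha/2}$; this is the sense in which comparing conventional intervals is more conservative than the two-sample test.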
Comments 

The symmetric tests in the previous section can be considered as a special case of the 
asymmetric tests in which $\alpha = \alpha_1 = \alpha_2$. For the asymmetric case, the two within-sample tests 
would have unequal statistical powers with respect to alternative hypotheses that are 
equivalent in magnitude. This might be desirable if the likelihood of detecting the hypothetical 
value ($\phi_0$) in one sample is greater than in the other. 

Directional Comparable Confidence Intervals 

To this point, the individual comparable confidence intervals have been used to assess the 
within-sample and between-sample tests simultaneously. These testing procedures are 
nondirectional since they do not provide the answers to, say, the following questions: "Which 
of $\phi_i$ (i = 1, 2, ..., k) is better? What would be the risk in making such a selection?". A 
procedure developed by Kaiser (1960) and modified by Shaffer (1972) and Leventhal and 
Huynh (1996) can be used to address this problem. With the introduction of the directional 
hypotheses for the two-tailed test of means difference, the one-sample comparable confidence 
intervals can be used for yet another purpose, namely, to predict the direction of the 
parameter difference. An important concept in evaluating directional hypotheses is the Type 
III error rate ($\gamma$) or "the risk of getting the direction wrong upon the rejection of the null 
hypothesis". It constitutes a component in the formula for the statistical power, 

Corrected Power: $\pi(\mu)^* = 1 - \beta - \gamma,$ 

where $\beta$ and $\gamma$ are the probabilities of making a Type II error and a Type III error, 
respectively. Computationally, the Type III error rate is represented by the tail area opposite 
to the predicted direction under the distribution of the alternative hypothesis. Consider the 
two-sample test of means discrepancy ($\delta = \mu_1 - \mu_2$). Without loss of generality, suppose $\delta > 
0$. Let $Z_2 = Z_{1-\alpha/2} - \delta_A/SE_d$ and $Z_1 = -(Z_{1-\alpha/2} + \delta_A/SE_d)$, where $\delta_A$ is the assumed value of the 
mean difference under the alternative hypothesis; $Z_2$ and $Z_1$ represent the right and left limits 
of the central range (or region of accepting the null hypothesis) but computed under the 
alternative hypothesis, respectively. For the nondirectional two-tailed test using the Z statistic, 
the conventional power is defined as 

Conventional Power: $\pi(\mu) = 1 - \beta = 1 + \Phi(Z_1) - \Phi(Z_2)$ 

(DeGroot, 1975, pp. 404-405; Zehna, 1970, p. 447). Hence, $\beta = \Phi(Z_2) - \Phi(Z_1)$ and $\gamma = \Phi(Z_1)$ 
so that 

$\pi(\mu)^* = \pi(\mu) - \gamma = 1 - \Phi(Z_2).$ 

Note that the Type III error rate is always less than $\alpha/2$ (Kaiser, 1960, p. 164; 
Leventhal and Huynh, 1996a, p. 284). 
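
The relation between the conventional power, the Type III error rate and the corrected power can be illustrated numerically. The sketch below uses hypothetical values for the assumed difference and its standard error; the function name and the returned keys are illustrative only.

```python
from statistics import NormalDist


def directional_power(delta_a, se_d, alpha=0.05):
    """Conventional power, Type II and Type III error rates, and corrected
    power for a two-tailed Z test when the true difference delta_a is positive."""
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2)
    z_2 = z - delta_a / se_d            # right limit of the acceptance region
    z_1 = -(z + delta_a / se_d)         # left limit of the acceptance region
    beta = normal.cdf(z_2) - normal.cdf(z_1)    # Type II error rate
    gamma = normal.cdf(z_1)                     # Type III error rate
    power = 1 - beta
    return {"power": power, "beta": beta, "gamma": gamma,
            "corrected_power": power - gamma, "z_2": z_2}


# Hypothetical alternative: a true difference of 5.5 with standard error 2.6.
out = directional_power(5.5, 2.6)
for name, value in out.items():
    print(name, value)
```

For any positive assumed difference, $Z_1$ lies below $-Z_{1-\alpha/2}$, so the Type III error rate stays below $\alpha/2$, in line with the bound cited above.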

Procedure 

(a) Specifying the hypotheses:

The within-sample tests remain the same but the between-sample tests are based on
three sets of hypotheses:

(Within-sample tests): $H_{0,i}$: $\phi_i = \phi_0$ vs. $H_{A,i}$: $\phi_i \neq \phi_0$, $i = 1, 2$;
where $\phi_0$ can be any value of interest.

(Between-sample test): $H_{d,1}$: $\phi_1 - \phi_2 < 0$, $H_{d,2}$: $\phi_1 - \phi_2 = 0$ (null
hypothesis), and $H_{d,3}$: $\phi_1 - \phi_2 > 0$.

(b) Computing the 100(1 - $\alpha$)% two-tailed comparable confidence interval ($CI^*_i$) (same
as under the nondirectional procedure with $\alpha/2$).

(c) Making two statistical decisions at the significance level of $\alpha/2$. The first decision
(i) remains unchanged. The second decision is modified as follows:

(ii) If the comparable confidence intervals $CI^*_1$ and $CI^*_2$ are disjoint, reject $H_{d,2}$ in
favor of $H_{d,1}$ if $\hat{\phi}_1 < \hat{\phi}_2$ (or reject $H_{d,2}$ in favor of $H_{d,3}$ if
$\hat{\phi}_1 > \hat{\phi}_2$, where $\hat{\phi}_i$ is the sample estimate of $\phi_i$, $i = 1, 2$)
with a Type III error rate ($\gamma$) less than $\alpha/2$. Otherwise, if $CI^*_1$ and $CI^*_2$
overlap, concede $H_{0,i}$, $i = 1, 2$, and assume that the difference in the two parameter
estimates is not statistically significant.
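Decision rule (ii) reduces to a comparison of interval endpoints. A minimal sketch follows, assuming each comparable confidence interval is supplied as a (lower, upper) pair; the function name `directional_decision` and the returned labels are hypothetical, and the sample call uses the two comparable intervals quoted in Example 4.

```python
def directional_decision(ci1, ci2, est1, est2):
    """Apply the disjoint/overlap rule to two comparable confidence intervals.

    ci1, ci2   : (lower, upper) comparable confidence intervals
    est1, est2 : the corresponding sample estimates
    """
    lo1, hi1 = ci1
    lo2, hi2 = ci2
    disjoint = hi1 < lo2 or hi2 < lo1
    if not disjoint:
        # Overlap: the difference is not declared statistically significant.
        return "no directional decision"
    # Disjoint: the direction follows the ordering of the sample estimates.
    return "phi1 < phi2" if est1 < est2 else "phi1 > phi2"


decision = directional_decision((23.18, 27.82), (17.22, 22.78), 25.5, 20.0)
print(decision)
```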

Suppose the null hypothesis ($H_{d,2}$) of the between-sample test is evaluated at $\alpha$, where
$\alpha$ = the probability that at least one of the alternatives will be rejected, given that the null
hypothesis is true (called "the overall significance level" by Shaffer, 1972, p. 196; and
Leventhal and Huynh, 1996a, p. 279). Then, the one-tailed tests $H_{d,1}$ and $H_{d,3}$ should be
conducted at $\alpha/2$ (or their 100(1 - $\alpha$)% conventional confidence intervals be evaluated).
Since at most one of the one-tailed tests can be statistically significant at $\alpha/2$, the
researcher can decide whether $\phi_1$ is less than, more than, or equal to $\phi_2$ according to
whether $H_{d,1}$, or $H_{d,3}$, or neither, is statistically significant at $\alpha/2$. As stated
above, the probability of making a mistake in deciding the direction is called the Type III error
(< $\alpha/2$). Apart from these changes, the computations of the one-sample conventional
confidence intervals, correction factor, comparable confidence intervals, comparative error rates
and comparative confidence intervals follow the same formulas derived for the nondirectional tests.
Example 4

Recall that in Example 1 the test of means difference ($H_{d,2}$: $\mu_1 - \mu_2 = 0$) is significant
at $\alpha_t = .05$ (two-tailed). Had the hypotheses $H_{d,1}$: $\mu_1 - \mu_2 < 0$ and $H_{d,3}$:
$\mu_1 - \mu_2 > 0$ been evaluated, the resulting 97.5% one-tailed confidence intervals are

$CI_{d,1}$: $(\mu_1 - \mu_2) \in (-\infty, 10.60)$, and $CI_{d,3}$: $(\mu_1 - \mu_2) \in (0.40, \infty)$.

Since the confidence interval $CI_{d,3}$ does not contain zero, one decides at $\alpha/2$ that
$H_{d,2}$ is rejected in favor of $H_{d,3}$, implying that $\mu_1 > \mu_2$. Actually, all of these
computations are unnecessary. Since the comparable confidence intervals $CI^*_1$ = (23.18, 27.82)
and $CI^*_2$ = (17.22, 22.78) are separable, and since $M_1 > M_2$, one can decide immediately that
$\mu_1$ is significantly larger than $\mu_2$ at $\alpha/2$, and this conclusion is made with a Type III
error probability less than 2.5%. Suppose the effect size $\delta_A = 3$. Since
$SE = 10\sqrt{(1/36) + (1/25)} = 2.603$, we have $Z_2 = 1.96 - (3/2.603) = .807$ and
$Z_1 = -(1.96 + (3/2.603)) = -3.112$, so that $\Phi(Z_1) = \Phi(-3.112) = .000927$,
$\Phi(Z_2) = \Phi(.807) = .7903$ and

$\pi(\delta) = 1 - .7903 + .000927 = 0.21063$,

and

$\pi(\delta)^* = \pi(\delta) - \gamma = 0.21063 - .000927 = .2097$.

Hence, the estimate of the Type III error rate is about 0.09%.
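The intervals quoted in this example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the inputs $\sigma = 10$, $n_1 = 36$ and $n_2 = 25$ are read off the standard-error expression above, and the means 25.5 and 20.0 are assumed to be the midpoints of the two reported comparable intervals (Example 1 itself is not reproduced here).

```python
from math import sqrt
from statistics import NormalDist

# Assumed inputs, back-derived from the quantities quoted in the example.
sigma, n1, n2 = 10.0, 36, 25
m1, m2 = 25.5, 20.0

z = NormalDist().inv_cdf(0.975)            # two-tailed critical value at alpha = .05
se1, se2 = sigma / sqrt(n1), sigma / sqrt(n2)
se_d = sigma * sqrt(1 / n1 + 1 / n2)       # standard error of the difference
cf = se_d / (se1 + se2)                    # correction factor

ci1 = (m1 - z * cf * se1, m1 + z * cf * se1)   # comparable CI for mu_1
ci2 = (m2 - z * cf * se2, m2 + z * cf * se2)   # comparable CI for mu_2
ci_d = (m1 - m2 - z * se_d, m1 - m2 + z * se_d)  # conventional CI for mu_1 - mu_2

print([round(x, 2) for x in ci1], [round(x, 2) for x in ci2])
print(round(ci1[0] - ci2[1], 4), round(ci_d[0], 4))
```

The last line illustrates why the two approaches agree: the gap between the comparable intervals equals the lower limit of the conventional interval for the difference, so the comparable intervals are disjoint exactly when that interval excludes zero.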

Comments 

In this procedure, both the two within-sample tests ($H_{0,i}$, $i = 1, 2$) and the one-tailed
tests for mean discrepancy ($H_{d,i}$, $i = 1, 3$) can have asymmetric regions as long as
$\alpha_1 + \alpha_2 = \alpha$ (for $H_{0,i}$, $i = 1, 2$) and $\alpha_{d1} + \alpha_{d3} = \Gamma$
(for $H_{d,i}$, $i = 1, 3$), where $\Gamma$ is a function of $\alpha_1 + \alpha_2$ as
shown previously. The same statements of the decision rules will apply with the appropriate
changes in the significance levels. In the symmetrical simultaneous tests, the maximum
probability of a Type III error is $\alpha/2$. On the other hand, in the asymmetric simultaneous
tests, it is equal to $\Gamma/2$, where $\alpha/2 < \Gamma/2 < \Gamma < \alpha$ (Shaffer, 1972). In the
extreme asymmetric case, if one of the $\alpha_i$ in the test of $H_{0,i}$ is set to zero then the other
is set equal to $\alpha$. At the same time, if one of the $\alpha_{di}$ is set at zero then $\Gamma$ is
also equal to $\alpha$. In other words, the within-sample and between-sample tests are reduced to two
independent one-tailed tests, each evaluated at $\alpha$. Hence, the symmetric test has the advantage
that it minimizes the maximum probability of a Type III error. However, it is not necessary and not
always best to impose symmetric critical regions, since it may be more important to detect the effect
of one sample, or differences in one direction, than in the other (Kaiser, 1960, p. 166; Shaffer,
1972, p. 196).

For the three-choice hypothesis of $H_0$: $\mu_1 - \mu_2 = 0$, $H_1$: $\mu_1 > \mu_2$, $H_2$:
$\mu_1 < \mu_2$, Hand, McCarter, and Hand (1985) proposed a procedure for testing the directional
two-tailed hypothesis by just evaluating the 100(1 - $\alpha$)% confidence interval of
$\mu_1 - \mu_2$. Their decision rules are: "(a) if the signs of the two limits are different (zero is
in the interval), then refuse to reach any conclusion about the population difference, (b) if both
signs are positive, then accept $H_1$ and reject both $H_0$ and $H_2$, and (c) if both signs are
negative, then accept $H_2$ and reject both $H_0$ and $H_1$" (p. 495). Certainly, this decision rule
can replace our rule (ii) since they imply the same statistical outcomes. However, for a simultaneous
evaluation of one-sample parameter estimation and two-sample statistical inference, the need for
comparable confidence intervals and, hence, our decision rule set of (i) and (ii) are well-grounded.
Using Comparable Confidence Intervals for Pre-planned Multiple Comparisons



General Framework 

If there are few a priori contrasts to be tested, the simplest method is the use of
multiple t tests (Howell, 1997, p. 354). The procedure for simultaneous parameter estimation
and statistical inference developed above is applicable to testing linear contrasts of means,
proportions, and correlations under both conditions of variance homogeneity and
heterogeneity. However, three modifications are necessary for this purpose. First, under
the assumption of variance homogeneity in an experiment of k independent groups, the pooled
variance ($S'^2$) used in computing the standard errors of the estimates ($SE_j$, $j = 1, 2, ..., k$)
is replaced by the error mean square (MSE) obtained from the one-way ANOVA table for the
total sample. The formula for MSE is specified as

$$MSE = \sum_{j=1}^{k} (n_j - 1) S_j^2 \Big/ \sum_{j=1}^{k} (n_j - 1), \qquad (15)$$

(Marascuilo and Serlin, 1988, p. 433), where $S_j^2$ is the jth sample variance. Secondly,
summary statistics of the specified contrasts must be computed in terms of weighted values.
For example, in testing $H_0$: $\mu_{A+B} = \mu_C$, the weighted mean and variance for the combined
group of A and B are

$$M_{AB} = (n_A M_A + n_B M_B)/N_{AB} \qquad (16)$$

and

$$S_{AB}^2 = [(n_A - 1) S_A^2 + (n_B - 1) S_B^2]/(N_{AB} - 2), \qquad (17)$$

where $N_{AB} = n_A + n_B$; $M_A$ and $M_B$ represent the sample means, and $S_A^2$ and $S_B^2$ the
sample variances, of the two groups A and B, respectively. Thirdly, since each contrast represents a
hypothesis to be tested using the same total sample, Student's t statistics are replaced by
Bonferroni-Dunn's t to control for the familywise error rate (Howell, 1997, pp. 362-364;
Marascuilo and Serlin, 1988, Chapter 33). Several authors (e.g., Dayton and Schafer, 1973,
and Schafer, 1992) have recommended the universal use of the Bonferroni adjustment for
controlling the familywise error rate in multiple comparisons as well as tests of correlations and
proportions because the procedure is simple and requires no restrictions on the nature of the
dependence of the tests. Moreover, compared with other more difficult and restrictive methods,
the loss of statistical power due to the Bonferroni adjustment is minimal.
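The first two modifications are simple weighted averages. The sketch below is a hedged illustration: the function names `pooled_mse` and `combine_two_groups` are hypothetical, the combined mean is assumed to be the usual size-weighted mean, and the inputs to `combine_two_groups` are made-up values used only to show the calls. The standard deviations passed to `pooled_mse` are those quoted later in Example 5.

```python
def pooled_mse(ns, variances):
    """Error mean square: sample variances weighted by their degrees of freedom."""
    numerator = sum((n - 1) * v for n, v in zip(ns, variances))
    denominator = sum(n - 1 for n in ns)
    return numerator / denominator


def combine_two_groups(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Weighted mean and pooled variance of two groups treated as one."""
    n_ab = n_a + n_b
    mean_ab = (n_a * mean_a + n_b * mean_b) / n_ab          # assumed size-weighted mean
    var_ab = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_ab - 2)
    return n_ab, mean_ab, var_ab


# Group sizes and standard deviations as quoted in Example 5.
mse = pooled_mse([23, 24, 11, 13], [0.23**2, 0.20**2, 0.22**2, 0.18**2])
# Hypothetical inputs for two subgroups A and B.
n_ab, mean_ab, var_ab = combine_two_groups(11, 1.5, 0.05, 12, 1.6, 0.06)
print(round(mse, 4), n_ab, round(mean_ab, 4), round(var_ab, 4))
```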

Preplanned Multiple Comparison of Means 

Suppose the population values for all group means and variances are unknown. A
researcher would like to test simultaneously h hypotheses ($h \le k$) on the contrasts
$\sum a_{j,d}\mu_j = 0$ ($d = 1, 2, ..., h$), where $a_{j,d}$ = 1, -1 or 0 ($j = 1, 2, ..., k$) = the
jth contrast weight such that $\sum a_{j,d} = 0$, and $\sum$ = the sum over k groups. Suppose the
researcher is also interested in knowing the range of $\mu_j$ ($j = 1, ..., k$). In other words,
the researcher wishes to test the following hypotheses:

(k within-sample tests): $H_{0,j}$: $\mu_j = \mu_0$ vs. $H_{A,j}$: $\mu_j \neq \mu_0$, $j = 1, 2, ..., k$;
where $\mu_0$ can be any value of interest.

(h across-sample tests): $H_{0,d}$: $\sum a_{j,d}\mu_{j,d} = 0$ versus $H_{A,d}$:
$\sum a_{j,d}\mu_{j,d} \neq 0$, $d = 1, 2, ..., h$, where $\sum a_{j,d}\mu_{j,d}$ is the dth contrast
involving m group means with nonzero contrast coefficients.
Each of the k within-sample tests is conducted in the same way as discussed above for
the one-sample tests of means. Given that the groups are independent, there is no familywise
error rate to be protected. On the other hand, the procedure for the h across-sample tests is
not so simple. Let us first consider the case in which the assumption of variance homogeneity
is tenable across all k groups. The 100(1 - $\alpha$)% two-tailed confidence interval for the dth
contrast is expressed as

$$CI_d: \Big|\sum a_{j,d} M_{j,d}\Big| \pm t_{f,\,1-\alpha/2h}\, SE_d, \qquad (18)$$

for $d = 1, 2, ..., h$, where |x| = the absolute value of x, $SE_d = \sqrt{MSE \sum (a_{j,d}^2/n_j)}$,
$N = n_1 + ... + n_k$ = the total sample size, $f = N - k$ = the degrees of freedom of the t statistic,
$\alpha/2h$ = the tail area per test that holds the familywise Type I error rate at $\alpha$ under the
Dunn-Bonferroni adjustment for h simultaneous hypotheses, and MSE is given in Equation 15.

For each contrast in the h across-sample tests, there are m within-sample tests ($m \le k$),
corresponding to the m nonzero contrast coefficients among $a_{j,d}$, $j = 1, 2, ..., k$. Suppose
$\mu_j$ ($j = 1, ..., k$) is included in the specification of the dth contrast. The correction factor
for computing the comparable confidence interval for $\mu_j$ is of the form

$$c_{f,d} = SE_d/SE_{\cdot} = \sqrt{MSE \sum (a_{j,d}^2/n_j)} \Big/ \sum \sqrt{S_j^2/n_j}, \qquad (19)$$

where $SE_d = \sqrt{MSE\,(a_{1,d}^2/n_1 + ... + a_{k,d}^2/n_k)}$, (k - m) coefficients of $a_{j,d}$ are
zero, and $SE_{\cdot} = S_1/\sqrt{n_1} + ... + S_m/\sqrt{n_m}$. Then the 100(1 - $\alpha$)% two-tailed
comparable confidence interval for $\mu_{j,d}$, $j = 1, ..., m$, is given by

$$CI^*_{j,d}: M_j \pm t_{f,\,1-\alpha/2h}\, SE^*_{j,d},$$

where $SE^*_{j,d} = c_{f,d}\, SE_j$. The associated comparative significance level is computed as

$$\alpha'_d = 2[1 - T_f(c_{f,d}\, t_{f,\,1-\alpha/2h})],$$

where $T_f(x) = \Pr(X \le x)$ = the cumulative probability at x under the t distribution with f
degrees of freedom.

In applying the above procedure under variance heterogeneity, one needs to modify the

degree of freedom for the t statistic. It is the solution for the equation f* = E(iij - 1)[1 - 

rWjAV)1. where ^ = n/S^j and W = ZWj (according to the Welch-Aspin approximation. See 

Marascuilo and Serlin, 1988, pp. 435-437). In evaluating the comparable confidence intervals 
per contrast under both variance conditions, directional two-tailed hypotheses can be assessed 
just as in the case of simple tests of means difference. 

Example 5 

In a certain study on the influence of professional training on attitudes toward persons
with disabilities, 71 subjects were randomly assigned to six experimental conditions ($n_A$ = 11,
$n_B$ = 12, $n_C$ = 10, $n_D$ = 14, $n_E$ = 11, $n_F$ = 13), which were subsequently combined into four
treatment groups, with $n_1 = n_A + n_B$, $n_2 = n_C + n_D$, $n_3 = n_E$ and $n_4 = n_F$. Responses on
the variable "Bias themes" (X) were measured (e.g., as defined in Kemp and Mallinckrodt, 1996).
Before data collection, the researcher was interested in testing four contrasts: $\mu_1$ vs. $\mu_2$,
$\mu_1$ vs. $\mu_3$, $\mu_1$ vs. $\mu_4$, and $\mu_3$ vs. $\mu_4$. The mean square error (MSE) for X
computed for the four groups is found to be 0.0438. The assumption of variance homogeneity is tenable
for the four contrasts (p > .26, p > .47, p > .19 and p > .25, respectively) according to a test of
variance homogeneity (Hinkle, Wiersma & Jurs, 1994, pp. 242-244; Ferguson & Takane, 1989, pp.
202-204). The results of the procedure for testing these contrasts are summarized in Table 3.
Insert Table 3 about here

In Panel 1 of Table 3, the sample means and standard deviations are computed
according to Equations 16 and 17, respectively. The conventional t tests, with degrees of
freedom $f_j = n_j - 1$ ($j$ = 1, 2, 3, 4), and the 100(1 - $\alpha$)% two-tailed conventional
confidence intervals are computed with the critical values of $t_{f_j,\,1-\alpha/2}$, where
$\alpha$ = .05. All of these individual confidence intervals overlap one another.

In Panel 2, the t tests of means difference are conducted with degrees of freedom f =
N - k = 71 - 4 = 67, and evaluated at $\alpha^* = \alpha/h$ = .05/4 = .0125 (two-tailed). The critical
value of the test statistic according to the Bonferroni-Dunn approximation is $t_{67,\,.05/4}$ = 2.567
(two-tailed) (obtained by extrapolation from Table 1, Appendix t', Howell, 1997, p. 687; or by the
command TINV(.0125/2, 67) in the SAS computer program). For a numerical illustration, let
us consider the evaluation of the third contrast, G1 vs. G4. The contrast coefficients are
specified as $a_1 = 1$, $a_2 = 0$, $a_3 = 0$, and $a_4 = -1$. Hence, upon dividing the contrast mean
($\sum a_{j,d3} M_{j,d3}$ = 1.54 - 1.74 = -.20) by its standard error
($SE_{d3} = \sqrt{MSE \sum a_{j,d3}^2/n_j} = \sqrt{.0438[(1)^2/23 + (-1)^2/13]}$ = .0726), one
obtains the test statistic $t_{d3}$ = -.20/.0726 = -2.711. The corresponding 95% confidence interval
according to the Dunn-Bonferroni approximation for the contrast is equal to

$CI_{d3}$: $|\sum a_{j,d3} M_{j,d3}| \pm t_{f,\,1-\alpha/2h}\, SE_{d3}$ = $|-.20| \pm (2.567)(.0726)$ = (-.383, -.010),

implying that the third contrast is statistically significant at $\alpha$ = .05 (two-tailed). Since
$SE_{\cdot} = \sqrt{(0.23)^2/23 + (0.20)^2/24 + (0.22)^2/11 + (0.18)^2/13}$ = .104, the correction
factor is given by

$c_{f,d3} = SE_{d3}/SE_{\cdot}$ = .0726/.104 = .698.

For the individual standard errors of $SE_1 = 0.23/\sqrt{23}$ = .048 and $SE_4 = 0.18/\sqrt{13}$ = .050,
their corrected values are $SE^*_{1,d3} = c_{f,d3} SE_1$ = (.698)(.048) = .033 and
$SE^*_{4,d3} = c_{f,d3} SE_4$ = (.698)(.050) = .035. Hence, in Panel 3, the 95% comparable confidence
intervals for $\mu_1$ and $\mu_4$ in the third contrast are

$CI^*_{1,d3}$: $M_1 \pm t_{f=67,\,1-\alpha/2h}\, SE^*_{1,d3}$ = 1.54 ± (2.567)(.033) = (1.46, 1.63),

and

$CI^*_{4,d3}$: $M_4 \pm t_{f=67,\,1-\alpha/2h}\, SE^*_{4,d3}$ = 1.74 ± (2.567)(.035) = (1.65, 1.83),

respectively. As expected, the two comparable confidence intervals are separable. Moreover,
at the risk of a Type III error less than $\alpha^*/2$ = .00625, one can assume that $\mu_1 < \mu_4$.
The corresponding comparative Type I error rate is

$\alpha' = 2[1 - T_f\{(.698)(2.567)\}]$ = 2[1 - .961] = .078,

implying that the 92.2% conventional confidence intervals for the means $\mu_1$ and $\mu_4$ are
separable.
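The arithmetic for the third contrast can be retraced as follows. This is a rough sketch rather than the authors' program: the critical value 2.567 and the quantity $SE_{\cdot}$ = .104 are taken as reported in the text, the helper `t_cdf` is a hypothetical stand-in that integrates the Student t density with Simpson's rule (so that no external library is needed), and small differences from the printed limits arise because the text rounds intermediate values.

```python
from math import exp, lgamma, log, pi, sqrt


def t_cdf(x, df, steps=2000):
    """Student t CDF via Simpson's rule on the density over [0, |x|] (steps even)."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(u):
        return exp(log_c - (df + 1) / 2 * log(1 + u * u / df))

    b = abs(x)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


mse, t_crit, df = 0.0438, 2.567, 67          # values reported in the example
n1, n4, s1, s4 = 23, 13, 0.23, 0.18
mean1, mean4 = 1.54, 1.74
se_dot = 0.104                               # taken as reported in the text

se_d = sqrt(mse * (1 / n1 + 1 / n4))         # standard error of the contrast
cf = se_d / se_dot                           # correction factor
se1, se4 = s1 / sqrt(n1), s4 / sqrt(n4)
ci1 = (mean1 - t_crit * cf * se1, mean1 + t_crit * cf * se1)
ci4 = (mean4 - t_crit * cf * se4, mean4 + t_crit * cf * se4)
alpha_prime = 2 * (1 - t_cdf(cf * t_crit, df))

print(round(se_d, 4), round(cf, 3), round(alpha_prime, 3))
print([round(v, 2) for v in ci1], [round(v, 2) for v in ci4])
```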

Comments 

The above method is applicable to a-priori multiple comparisons of contrasts with
different weights and symmetric critical regions. Extensions to asymmetric critical regions
are being studied. However, tables of the Dunn-Bonferroni adjustment for t tests with unequal
allocation of Type I error rates are available (Dayton and Schafer, 1973).
Discussions and Conclusions

The paper began with a discussion of the logic and motivation for undertaking a
study of comparable confidence intervals for two-sample means tests. The proposed
procedures were then generalized in terms of parameter type, variance conditions and tailed
error allocation for critical regions. Procedures for nondirectional and directional two-tailed
tests for a simultaneous evaluation of within-sample parameter estimation and across-sample
tests of parameter discrepancy for pre-planned simple and multiple comparisons were studied.
It was suggested that for such a simultaneous evaluation, one need only compute and
analyze the comparable confidence intervals per parameter pair. Thus, if the comparable
confidence intervals are separable, one can assume that the pair of parameter estimates are
significantly different, proceed to determine the confidence limits for such a difference,
and predict the direction of the parameter difference with the risk of
getting the direction wrong less than half the nominal significance level (with the
Dunn-Bonferroni adjustment in the case of multiple comparisons).

Researchers sometimes attempt to cover all bases by conducting a significance test 
and, when results are significant, calculating a confidence interval to estimate the parameter 
range. Unfortunately, making the decision to estimate parameter size contingent on the 
outcome of the significance test produces a biased estimate of the parameter (Schmidt, 1992) 
and loss of statistical power (Bancroft, 1944; Bennett, 1952). It is important to emphasize that
the proposed procedures are developed for a-priori pairwise or multiple comparisons.
Therefore, neither the construction nor the statistical power of comparable confidence
intervals is influenced by knowledge of the significance test outcomes. Because of these
concerns, comparable confidence intervals are not recommended for post-hoc pairwise
comparisons.

A question of interest is how the power of comparable confidence intervals can be 
determined. The within-sample tests for $\phi_i$, and the across-sample tests for
$\phi_i - \phi_{i'}$, $i, i' = 1, 2, ..., k$ ($k \ge 2$), may have different statistical powers.
Depending on the disparity in sample sizes and variances as well as the magnitude of $\phi_0$, the
within-sample tests could be less (or more) powerful than the across-sample tests for the same
parameter pair. For example, in two-sample studies, if both $\phi_1$ and $\phi_2$ are small but
substantially different and $\phi_0$ is set at zero, it is likely that the between-sample test will
lead to the rejection of the null hypothesis when it is false more often than the within-sample tests
will at the same $\alpha$ level. However, the difference in power of within-sample and between-sample
tests does not influence the separability of comparable confidence intervals. Conceptually, as far as
the decisions on parameter discrepancy (or separability) and predicted direction of the disparity are
concerned, the powers of comparable confidence intervals are akin to the powers of the across-sample
tests.

As another point of clarification, in the procedures of a single pairwise comparison for
two-sample studies, the pre-determined $\alpha$ level in the simultaneous evaluation of the
within-sample and between-sample tests by two comparable confidence intervals does not require
any adjustment of the type required for the familywise error rate. Under each of the
within-sample and between-sample tests in the proposed procedures, the chosen $\alpha$ represents
the error rate per comparison (Howell, 1997, p. 349). When the two comparable confidence
intervals are evaluated, $\alpha$ becomes the error rate per experiment (Howell, 1997, p. 349) for
an experiment in which only one test has been conducted, namely the between-sample test. Of
course, the adjustment for controlling the familywise error rate is needed in a-priori multiple
comparisons.

For most practical research settings, it is believed that the proposed methods are both
effective and meaningful. Instead of conducting three significance tests ($H_{0,i}$, $i = 1, 2$ and
$H_d$), only two comparable confidence intervals are required. The saving can be more substantial
with respect to a simultaneous testing of individual hypotheses and multiple comparisons.

With regard to interpretation, one may question the appropriateness of using
interval separability to make decisions on statistical significance. We recognize that the
separability of two comparable confidence intervals and the statistical significance of a test
for parameter discrepancy may have different meanings to researchers. However, confidence
intervals possess the same mathematical properties as significance tests, and more. A
confidence coefficient represents the probability of producing an interval containing the true
value of the parameter of interest. Declaring that the two confidence intervals are
disjoint, with specified confidence coefficient and interval bounds, when the influence of
variations in sample size and sample variances has been accounted for, conveys a clearer
message about the magnitude and nature of the difference between the two sample estimates than
saying that their difference is "significant". This is because the statement that the probability
that the test statistic would take a value as extreme as or more extreme than that actually
observed is smaller than $\alpha/2$ does not depict the whole picture of the difference. The
confidence widths actually estimate the relative sizes of the individual effects. The measure of
separability (or overlap) of the two comparable confidence intervals indicates the size of their
discrepancy (or similarity), or an estimate of the effect size of the difference scores.

Compared with the conventional confidence intervals, the proposed method yields
confidence intervals with narrower widths, overlapping bounds for nonsignificant mean
differences and separable limits for statistically significant results. These properties are
confirmed in the examples under consideration. The proposed procedure is particularly useful
whenever it is meaningful to evaluate confidence intervals for both parameter estimation and
hypothesis testing. For example, in the calibration of IQ scores, psychometricians may want
to determine the score range of IQ groups such that the mean difference between any pair of
adjacent groups will be statistically significant. Psychiatrists are also interested in testing the
difference of Verbal IQ score (VIQ) against Performance IQ score (PIQ) as well as the
individual ranges of VIQ and PIQ that potential patients may belong to. As another area
of potential application, studies for decision-making purposes generally require both statistical
estimation and statistical inference, as opposed to exploratory studies which are mainly based
on hypothesis testing for explanation purposes. For example, in clinical research, explanatory
trials are conducted to determine whether a difference in treatments exists at all, whereas the
more sophisticated management trials are aimed at not only comparing treatment means but
also deciding which treatment is better or should be used (Willan, 1994).

Cox and Hinkley (1974) considered interval estimation as "the central problem of 
statistical inference" and Cox (1977) concluded "that therefore estimation, at least roughly, of 
the magnitude of effects is in general essential regardless of whether statistically significant 
departure from the null hypothesis is achieved" (p.70). Tukey (1991) discussed four 
compelling reasons about the importance of confidence intervals and maintained, "It should be 
clear (a) that confidence intervals are irreplaceable and (b) why they are needed" (p. 102). 

The proposed methods serve to illustrate, and advance, the utility of the confidence interval
approach in supporting these arguments.
Footnotes

1. Researchers who believe the magnitude of effect size to be irrelevant as long as the
effect is either statistically or practically important may not be interested in the comparison of
confidence interval widths (Parker, 1995; Prentice and Miller, 1992). However, our concern is
with the waste of one-sample information if one only conducts the evaluation of the difference
scores.

2. The preference for confidence intervals has been echoed through diverse disciplines
(e.g., in the behavioral sciences by Carver, 1978, 1993; Cohen, 1994; LaForge, 1967;
Schmidt, 1996; Serlin, 1993 and Shaffer, 1995; engineering by Hahn, 1974; Hahn and
Meeker, 1991; Hsu, 1996; and Natrella, 1960; and medical studies by Borenstein, 1994;
Gardner and Altman, 1986; Langman, 1986; Poole, 1987; Rothman, 1978, 1986; and
Thompson, 1987. In fact, this list is severely incomplete). See also the comments on Cohen
(1994) by Baril and Cannon (1995), Cohen (1995), Frick (1995), Hubbard (1995), McGraw
(1995), Parker (1995), and Svyantek and Ekeberg (1995). For a critical view on the role of
confidence intervals in hypothesis testing, see Cortina and Dunlap (1997).

3. A full treatment of using confidence intervals for multiple comparisons is found in
Hsu (1996).

4. For the test of $H_{0,d}$, the test statistic is equal to 2.1126, corresponding to a p-value of
$\alpha_p = 2(1 - \Phi(2.1126))$ = .0346, and the corresponding significant confidence interval is
$CI_d$ = (0, 11).

5. If the test statistic for a conventional two-tailed hypothesis is not significant at $\alpha/2$,
then the Type III error for the corresponding two-tailed directional test is zero or undefined;
and if the test statistic is significant at $\alpha/2$, as the parameter discrepancy ($\delta$), and
the power of the test, decrease to zero, the Type III error increases to its maximum value of
$\alpha/2$. On the contrary, the Type III error becomes infinitesimal as power increases to 1.
Table 1

Correction Factors in Computing Confidence Intervals for Two-sample Tests of Means ($\mu$),
Proportions ($\pi$) and Correlation Coefficients ($\rho$)

Case  Null hypothesis               Sampling       Distribution                        Correction factor
                                    distribution*  conditions
1a    $H_0$: $\mu_1 - \mu_2 = 0$    Z              $\sigma_1^2 = \sigma_2^2$           $c_f = \sqrt{1 + m}/(1 + \sqrt{m})$
1b    $H_0$: $\mu_1 - \mu_2 = 0$    Z              $\sigma_1^2 \neq \sigma_2^2$        $c_f = \sqrt{u + m}/(\sqrt{u} + \sqrt{m})$
1c    $H_0$: $\mu_1 - \mu_2 = 0$    t(f)           $\sigma_1^2 = \sigma_2^2$           $c_f = \sqrt{(1 + m)(1 + gv)}/[\sqrt{1 + g}(\sqrt{m} + \sqrt{v})]$
1d    $H_0$: $\mu_1 - \mu_2 = 0$    t(f*)          $\sigma_1^2 \neq \sigma_2^2$        $c_f = \sqrt{v + m}/(\sqrt{v} + \sqrt{m})$
2     $H_0$: $\pi_1 - \pi_2 = 0$    Z              $n_i p_i > 5$, $n_i(1 - p_i) > 5$   $c_f = \sqrt{S'^2 + mS'^2}/(\sqrt{S_2^2} + \sqrt{mS_1^2})$
3     $H_0$: $\rho_1 - \rho_2 = 0$  Z                                                  $c_f = \sqrt{(n_1 - 3) + (n_2 - 3)}/(\sqrt{n_1 - 3} + \sqrt{n_2 - 3})$

Note. $m = n_2/n_1$, $u = \sigma_2^2/\sigma_1^2$, $v = S_2^2/S_1^2$, $f = f_1 + f_2$, $f_i = n_i - 1$ for
$i = 1, 2$, $g = f_2/f_1$, $f^* = [(w^2/f_1) + ((1 - w)^2/f_2)]^{-1}$,
$w = (S_1^2/n_1)/[(S_1^2/n_1) + (S_2^2/n_2)]$, $S'^2 = p'(1 - p')$, $p' = (n_1 p_1 + n_2 p_2)/N$, and
$N = n_1 + n_2$.

* Sampling distribution of the test statistics.
Table 2

Confidence Coefficients of Conventional and Comparable Confidence Intervals Corresponding
to Selected Values of Type I Error and Correction Factor

Panel 1. Z tests

                              Correction factor ($c_f$)
α      CI*    Z_p      0.70   0.71   0.72   0.73   0.74   0.75   0.76   0.77   0.78   0.79   0.80
.0001  .9999  3.891    0.994  0.994  0.995  0.996  0.996  0.997  0.997  0.997  0.998  0.998  0.998
.01    .99    2.576    0.929  0.933  0.936  0.940  0.943  0.947  0.950  0.953  0.955  0.958  0.961
.05    .95    1.960    0.830  0.836  0.842  0.848  0.853  0.858  0.864  0.869  0.874  0.878  0.883
.10    .90    1.645    0.750  0.757  0.764  0.770  0.776  0.783  0.789  0.795  0.801  0.806  0.812
.20    .80    1.282    0.630  0.637  0.644  0.651  0.657  0.664  0.670  0.676  0.685  0.689  0.695

Panel 2. t Tests

                              Correction factor ($c_f$)
α     CI*   f     T_p      0.70   0.71   0.72   0.73   0.74   0.75   0.76   0.77   0.78   0.79   0.80
.01   .99   10    3.169    0.949  0.952  0.955  0.957  0.959  0.961  0.963  0.965  0.967  0.969  0.970
            20    2.845    0.940  0.943  0.946  0.949  0.952  0.955  0.957  0.960  0.962  0.964  0.966
            40    2.704    0.934  0.938  0.941  0.945  0.948  0.951  0.954  0.956  0.959  0.962  0.963
            80    2.639    0.932  0.935  0.939  0.942  0.946  0.949  0.952  0.955  0.957  0.960  0.962
            160   2.607    0.930  0.934  0.938  0.941  0.944  0.948  0.951  0.954  0.956  0.959  0.962
.05   .95   10    2.228    0.850  0.855  0.860  0.865  0.870  0.874  0.879  0.883  0.887  0.891  0.895
            20    2.086    0.840  0.846  0.851  0.857  0.862  0.867  0.871  0.876  0.881  0.885  0.889
            40    2.021    0.835  0.841  0.847  0.852  0.857  0.863  0.868  0.872  0.877  0.882  0.886
            80    1.991    0.833  0.838  0.844  0.850  0.855  0.861  0.866  0.871  0.875  0.880  0.885
            160   1.975    0.832  0.837  0.843  0.849  0.854  0.859  0.865  0.870  0.875  0.879  0.884
.10   .90   10    1.812    0.767  0.773  0.779  0.785  0.790  0.796  0.802  0.807  0.812  0.817  0.822
            20    1.725    0.759  0.765  0.771  0.777  0.784  0.789  0.795  0.801  0.806  0.812  0.817
            40    1.684    0.755  0.762  0.768  0.774  0.780  0.786  0.792  0.798  0.803  0.809  0.814
            80    1.664    0.752  0.759  0.766  0.772  0.778  0.785  0.790  0.796  0.802  0.808  0.813
            160   1.654    0.751  0.758  0.765  0.771  0.777  0.784  0.790  0.795  0.801  0.807  0.812
.20   .80   10    1.372    0.641  0.647  0.654  0.660  0.666  0.672  0.678  0.684  0.690  0.696  0.702
            20    1.325    0.635  0.642  0.649  0.655  0.662  0.668  0.674  0.680  0.686  0.692  0.698
            40    1.303    0.633  0.640  0.646  0.653  0.659  0.666  0.672  0.678  0.684  0.691  0.697
            80    1.292    0.632  0.638  0.645  0.652  0.658  0.665  0.671  0.677  0.683  0.690  0.696
            160   1.287    0.631  0.638  0.644  0.651  0.658  0.664  0.670  0.677  0.683  0.689  0.695

Note. α = Nominal Type I error, CI* = Confidence coefficient of the comparable confidence
interval, Z_p = Critical value of the Z statistic for a two-tailed test at the α level, T_p = Critical
value of the t statistic with f degrees of freedom for a two-tailed test at the α level under the
assumption of variance homogeneity, and f = n - 1 = degrees of freedom for t tests.
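The body entries of Panel 1 appear to follow from the comparative Type I error formula, that is, each entry equals $2\Phi(c_f Z_p) - 1$. The sketch below regenerates the Z-test rows under that assumption; the function name `comparable_confidence` is hypothetical.

```python
from statistics import NormalDist


def comparable_confidence(alpha, cf):
    """Conventional confidence coefficient at which two intervals just separate.

    Computes 1 - alpha' with alpha' = 2[1 - Phi(cf * Z_{1 - alpha/2})].
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    return 2 * std.cdf(cf * z_crit) - 1


factors = [c / 100 for c in range(70, 81)]       # 0.70, 0.71, ..., 0.80
for alpha in (0.01, 0.05, 0.10, 0.20):
    row = [round(comparable_confidence(alpha, cf), 3) for cf in factors]
    print(alpha, row)
```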
Appendix

DERIVATION OF THE FORMULAS IN TABLE 1

For each procedure listed below, the formulas related to two-tailed tests under
unbalanced designs ($n_1 \neq n_2$) are given for the following substeps: (i)
confidence interval for the difference scores ($CI_d$); (ii) one-sample comparable confidence
intervals ($CI^*_i$); and (iii) comparative Type I error ($\alpha'$). For one-tailed tests, replace
$\alpha/2$ by $\alpha$ and $\alpha'$ by $\alpha'/2$ in the formulas. Based on these results, formulas
under balanced designs are trivial. Some of the notations that will be used repeatedly are:
$m = n_2/n_1$, $u = \sigma_2^2/\sigma_1^2$, $v = S_2^2/S_1^2$, $f = f_1 + f_2$, $f_i = n_i - 1$ for
$i = 1, 2$; $\Phi(x)$ and $T_f(x)$ represent the cumulative distribution functions (CDF) of the
standard normal, and of t with f degrees of freedom, respectively, evaluated at x.



Case 1. Simple Comparisons of Means

1a. Variance homogeneity ($\sigma_1 = \sigma_2 = \sigma$, known)

(i) $CI_d$: $(M_1 - M_2) \pm Z_{1-\alpha/2}\, SE_d$, where $SE_d = \sigma\sqrt{(n_1 + n_2)/n_1 n_2}$.

(ii) $CI^*_i$: $M_i \pm Z_{1-\alpha/2}\, SE^*_i$, where $SE^*_i = c_f SE_i$, $SE_i = \sigma/\sqrt{n_i}$,
$i = 1, 2$, and
$c_f = \sqrt{n_1 + n_2}/(\sqrt{n_1} + \sqrt{n_2}) = \sqrt{1 + m}\,[1 + \sqrt{m}]^{-1}$.

(iii) $\alpha' = 2[1 - \Phi(c_f Z_{1-\alpha/2})]$.

1b. Variance heterogeneity ($\sigma_1 \neq \sigma_2$, known)

(i) $CI_d$: $(M_1 - M_2) \pm Z_{1-\alpha/2}\, SE_d$, where
$SE_d = \sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}$.

(ii) $CI^*_i$: $M_i \pm Z_{1-\alpha/2}\, SE^*_i$, where $SE^*_i = c_f SE_i$,
$SE_i = \sigma_i/\sqrt{n_i}$, $i = 1, 2$,
and $c_f = \sqrt{n_2\sigma_1^2 + n_1\sigma_2^2}\,[\sigma_1\sqrt{n_2} + \sigma_2\sqrt{n_1}]^{-1}
= \sqrt{m + u}\,[\sqrt{m} + \sqrt{u}]^{-1}$.

(iii) $\alpha' = 2[1 - \Phi(c_f Z_{1-\alpha/2})]$.
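The closed forms in Cases 1a and 1b can be verified against the defining ratio $c_f = SE_d/(SE_1 + SE_2)$. A minimal sketch follows, with hypothetical function names and illustrative inputs; the first call uses the sample sizes of Example 4 (36 and 25).

```python
from math import sqrt


def cf_known_equal(n1, n2):
    """Case 1a: known, equal variances. Depends on the sample-size ratio only."""
    m = n2 / n1
    return sqrt(1 + m) / (1 + sqrt(m))


def cf_known_unequal(n1, n2, sigma1, sigma2):
    """Case 1b: known, unequal variances."""
    m = n2 / n1
    u = sigma2**2 / sigma1**2
    return sqrt(m + u) / (sqrt(m) + sqrt(u))


def cf_from_standard_errors(se1, se2):
    """Defining ratio: SE of the difference over the sum of the individual SEs."""
    return sqrt(se1**2 + se2**2) / (se1 + se2)


cf_a = cf_known_equal(36, 25)
cf_b = cf_known_unequal(36, 25, sigma1=10.0, sigma2=14.0)   # illustrative sigmas
print(round(cf_a, 4), round(cf_b, 4))
```

With equal sample sizes and equal variances both functions return $\sqrt{2}/2 \approx .707$, which is the smallest value the correction factor can take.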

l c. Variance homogeneity (o, = 0^ = 0, unknown) 

(i) (M, - Mo ) + , . ^ nSEn . where SE^ = cW {2/n}, f = f i + I 2 and £ = n; - 1, 

i= 1,2. 

(ii) Q*j: M ± ki - where SE* ; = CfSE . SE; = ^/Vn^, i = 1,2, and 

Cf = V{(1 + gu)(l + m)[l + £]'‘}/[Vm + = V{(1 + m)(l + gv)}[V{ 1 + £}(Vm + Vy)]'' 

where £ = fj/fi- 

(iii) oc = 2[1 - Tf(V{^j^ 1 .a/ 2 })]- 

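1d. Variance heterogeneity ($\sigma_1 \neq \sigma_2$, unknown)

(i) $CI_d$: $(M_1 - M_2) \pm t_{f,1-\alpha/2}\,SE_d$, where $SE_d = \sqrt{(S_1^2/n_1) + (S_2^2/n_2)}$, and the
approximate degrees of freedom are

$$f = \frac{(SE_1^2 + SE_2^2)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}} = [w^2/f_1 + (1 - w)^2/f_2]^{-1}$$

(Satterthwaite, 1946; Welch, 1938), where $SE_i = S_i/\sqrt{n_i}$, $i = 1, 2$, $w = SE_1^2/SE_d^2$, and
$SE_d^2 = (S_1^2/n_1) + (S_2^2/n_2)$.

(ii) $CI^*_i$: $M_i \pm t_{f,1-\alpha/2}\,SE^*_i$, where $SE^*_i = c_f SE_i$ and $c_f = [\sqrt{w} + \sqrt{1 - w}]^{-1}$. Since
$w = m/(m + v)$, the correction factor can be written as $c_f = \sqrt{m + v}\,[\sqrt{m} + \sqrt{v}]^{-1}$.

(iii) $\alpha' = 2[1 - T_f(\sqrt{c_f\,t_{f,1-\alpha/2}})]$.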
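
The two expressions for the approximate degrees of freedom and the two expressions for $c_f$ in 1d can be compared numerically. The following is a minimal sketch with a hypothetical function name and hypothetical data; it only evaluates the formulas above and does not compute a t quantile.

```python
from math import isclose, sqrt


def welch_df_and_cf(s1, n1, s2, n2):
    """Return (df direct form, df weighted form, c_f from w, c_f from m and v)."""
    a1, a2 = s1**2 / n1, s2**2 / n2          # squared standard errors SE_i^2
    f1, f2 = n1 - 1, n2 - 1
    df_direct = (a1 + a2) ** 2 / (a1**2 / f1 + a2**2 / f2)
    w = a1 / (a1 + a2)                        # share of SE_d^2 due to sample 1
    df_weighted = 1 / (w**2 / f1 + (1 - w) ** 2 / f2)
    m, v = n2 / n1, s2**2 / s1**2
    c_from_w = 1 / (sqrt(w) + sqrt(1 - w))
    c_from_mv = sqrt(m + v) / (sqrt(m) + sqrt(v))
    return df_direct, df_weighted, c_from_w, c_from_mv


# Hypothetical data: S1 = 4, n1 = 8, S2 = 9, n2 = 20.
df_direct, df_weighted, c_from_w, c_from_mv = welch_df_and_cf(4.0, 8, 9.0, 20)
print("df forms agree:", isclose(df_direct, df_weighted))
print("c_f forms agree:", isclose(c_from_w, c_from_mv))
print("df =", round(df_direct, 2), " c_f =", round(c_from_w, 4))
```

The resulting degrees of freedom are generally not an integer, so a t table lookup would need rounding or interpolation.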

Case 2. Simple Comparisons of Proportions 

In testing the null hypothesis $H_0$: $\pi_1 = \pi_2$, the application of the following procedure
requires that $n_i p_i > 5$ and $n_i(1 - p_i) > 5$ ($i = 1, 2$), where $\pi_i$ and $p_i$ are the population and
sample proportions of the ith population, respectively.

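(i) $CI_i$: $p_i \pm Z_{1-\alpha/2}\,SE_i$, where $SE_i = S_i/\sqrt{n_i}$ and $S_i = \sqrt{p_i(1 - p_i)}$, $i = 1, 2$.

(ii) $CI_d$: $(p_1 - p_2) \pm Z_{1-\alpha/2}\,SE_d$, where $SE_d = S'\sqrt{N/(n_1 n_2)}$, $S' = \sqrt{p'(1 - p')}$,

$p' = (n_1 p_1 + n_2 p_2)/N$, and $N = n_1 + n_2$.

(iii) $CI^*_i$: $p_i \pm Z_{1-\alpha/2}\,SE^*_i$, where $SE^*_i = c_f SE_i$, $i = 1, 2$, $SE_i = S_i/\sqrt{n_i}$, $S_i = \sqrt{p_i(1 - p_i)}$,
and $c_f = S'\sqrt{N}\,[S_1\sqrt{n_2} + S_2\sqrt{n_1}]^{-1} = \sqrt{S'^2 + mS'^2}\,[\sqrt{S_2^2} + \sqrt{mS_1^2}]^{-1}$.

(iv) $\alpha' = 2[1 - \Phi(\sqrt{c_f Z_{1-\alpha/2}})]$.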
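
A minimal Python sketch of steps (i) to (iii) for proportions follows. It is an illustration only: the function names and the data are hypothetical, and the sample-size condition stated above is enforced with a simple check.

```python
from math import sqrt
from statistics import NormalDist


def proportion_intervals(p1, n1, p2, n2, alpha=0.05):
    """Return (c_f, CI_d, conventional CIs, comparable CIs) for Case 2."""
    for p, n in ((p1, n1), (p2, n2)):
        if not (n * p > 5 and n * (1 - p) > 5):
            raise ValueError("requires n*p > 5 and n*(1 - p) > 5 in each sample")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    big_n = n1 + n2
    p_pooled = (n1 * p1 + n2 * p2) / big_n            # p'
    s_pooled = sqrt(p_pooled * (1 - p_pooled))        # S'
    se_d = s_pooled * sqrt(big_n / (n1 * n2))
    s1, s2 = sqrt(p1 * (1 - p1)), sqrt(p2 * (1 - p2))
    c_f = s_pooled * sqrt(big_n) / (s1 * sqrt(n2) + s2 * sqrt(n1))
    ci_d = (p1 - p2 - z * se_d, p1 - p2 + z * se_d)
    conventional, comparable = [], []
    for p, s, n in ((p1, s1, n1), (p2, s2, n2)):
        se = s / sqrt(n)
        conventional.append((p - z * se, p + z * se))
        comparable.append((p - z * c_f * se, p + z * c_f * se))
    return c_f, ci_d, conventional, comparable


def disjoint(a, b):
    """True when intervals a and b share no point."""
    return a[1] < b[0] or b[1] < a[0]


# Hypothetical data: p1 = 0.60 (n1 = 100), p2 = 0.45 (n2 = 120).
c_f, ci_d, conventional, comparable = proportion_intervals(0.60, 100, 0.45, 120)
significant = not (ci_d[0] <= 0.0 <= ci_d[1])
print("c_f =", round(c_f, 4))
print("two-sample test significant:", significant)
print("conventional intervals disjoint:", disjoint(*conventional))
print("comparable intervals disjoint:", disjoint(*comparable))
```

With these values the pooled two-sample test is significant, the conventional intervals overlap, and the comparable intervals are disjoint.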

Case 3. Simple Comparisons of Correlations 

For statistical inference regarding $\rho_1$ and $\rho_2$, where $\rho_i$ denotes the population correlation
coefficient, the following procedure requires the computation of $Z_{r_i}$, the Fisher's Z
transformation of $r_i$, $i = 1, 2$, where $r_i$ represents the sample Pearson correlation of the ith
population.

(a1) $CI_i$ for $Z_{\rho_i}$: $Z_{r_i} \pm Z_{1-\alpha/2}\,SE_i = (Z_{r_i,L}, Z_{r_i,U})$, where $SE_i = [\sqrt{n_i - 3}]^{-1}$, $i = 1, 2$;
$Z_{r_i,L}$ and $Z_{r_i,U}$ are the lower and upper bounds of this confidence interval, respectively. They can be
converted into the raw score units of measurement as $(r_{iL}, r_{iU})$:
$([1 - \exp\{-2Z_{r_i,L}\}][1 + \exp\{-2Z_{r_i,L}\}]^{-1},\ [1 - \exp\{-2Z_{r_i,U}\}][1 + \exp\{-2Z_{r_i,U}\}]^{-1})$.

(i) The $100(1 - \alpha)\%$ two-tailed confidence interval for $Z_{\rho_1} - Z_{\rho_2}$:

$CI_d$: $(Z_{r_1} - Z_{r_2}) \pm Z_{1-\alpha/2}\,SE_d = (Z_{d,L}, Z_{d,U})$,

where $SE_d = \sqrt{(n_1 - 3)^{-1} + (n_2 - 3)^{-1}}$.

The $100(1 - \alpha)\%$ two-tailed confidence interval for $\rho_1 - \rho_2$:

$CI_d$: $([1 - \exp\{-2Z_{d,L}\}][1 + \exp\{-2Z_{d,L}\}]^{-1},\ [1 - \exp\{-2Z_{d,U}\}][1 + \exp\{-2Z_{d,U}\}]^{-1})$.

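(ii) The $100(1 - \alpha)\%$ two-tailed comparable confidence interval for $Z_{\rho_i}$:

$CI^*_i$: $Z_{r_i} \pm Z_{1-\alpha/2}\,SE^*_i = (Z^*_{r_i,L}, Z^*_{r_i,U})$,

where $SE^*_i = c_f SE_i$, $i = 1, 2$, and

$$c_f = \frac{\sqrt{n_1 + n_2 - 6}}{\sqrt{n_1 - 3} + \sqrt{n_2 - 3}}.$$

The $100(1 - \alpha)\%$ two-tailed comparable confidence interval for $\rho_i$:
$CI^*_i$: $([1 - \exp\{-2Z^*_{r_i,L}\}]/[1 + \exp\{-2Z^*_{r_i,L}\}],\ [1 - \exp\{-2Z^*_{r_i,U}\}]/[1 + \exp\{-2Z^*_{r_i,U}\}])$.
(iii) $\alpha' = 2[1 - \Phi(\sqrt{c_f Z_{1-\alpha/2}})]$.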
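
The correlation case can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's implementation: the function names and data are hypothetical, and `math.atanh` / `math.tanh` are used for the Fisher transformation and its inverse, since $\tanh(z) = [1 - \exp\{-2z\}]/[1 + \exp\{-2z\}]$.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_intervals(r1, n1, r2, n2, alpha=0.05):
    """Return (c_f, CI_d on the Z scale, conventional CIs, comparable CIs on the r scale)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_d = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    c_f = sqrt(n1 + n2 - 6) / (sqrt(n1 - 3) + sqrt(n2 - 3))
    z_diff = atanh(r1) - atanh(r2)
    ci_d = (z_diff - z * se_d, z_diff + z * se_d)
    conventional, comparable = [], []
    for r, n in ((r1, n1), (r2, n2)):
        zr, se = atanh(r), 1 / sqrt(n - 3)      # Fisher Z and its standard error
        conventional.append((tanh(zr - z * se), tanh(zr + z * se)))
        comparable.append((tanh(zr - z * c_f * se), tanh(zr + z * c_f * se)))
    return c_f, ci_d, conventional, comparable


def disjoint(a, b):
    """True when intervals a and b share no point."""
    return a[1] < b[0] or b[1] < a[0]


# Hypothetical data: r1 = 0.60 and r2 = 0.35, each from n = 103 cases.
c_f, ci_d, conventional, comparable = correlation_intervals(0.60, 103, 0.35, 103)
significant = not (ci_d[0] <= 0.0 <= ci_d[1])
print("c_f =", round(c_f, 4))
print("two-sample test significant:", significant)
print("conventional intervals disjoint:", disjoint(*conventional))
print("comparable intervals disjoint:", disjoint(*comparable))
```

Because the inverse transformation is strictly increasing, intervals that are disjoint on the Z scale remain disjoint after conversion to the correlation scale.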